Stability-based cluster analysis applied to microarray data
Authors
Abstract
This paper studies the estimation of the number of clusters using the so-called stability-based approach, in which the clusterings obtained for two subsets of the data set are compared via a similarity index, and the number of clusters is chosen based on the statistics of that index over randomly selected subsets. We introduce a new similarity index and analyze the consistency of the resulting estimator of the number of classes when the k-means algorithm is used in conjunction with the new index. Several similarity indices are evaluated experimentally by comparing the “true” data partition with the partition obtained at each level of a hierarchical clustering tree. Finally, results on real data are reported for a glioma microarray dataset.
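The stability-based procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the paper's new similarity index is not specified here, so the sketch swaps in a standard pair-counting Jaccard coefficient. It also assumes a plain-Python k-means with k-means++ seeding and restarts, pairs of 80% random subsamples compared on their overlapping points, and "highest mean similarity" as the decision rule. All function names (`kmeans`, `pair_jaccard`, `stability_scores`, `estimate_k`) and parameters are hypothetical.

```python
import random
import statistics

# Minimal sketch of stability-based selection of the number of clusters.
# Assumptions (not from the paper): plain-Python k-means with k-means++ seeding,
# a pair-counting Jaccard index instead of the paper's new index, 80% subsamples,
# and "highest mean similarity" as the decision rule. All names are hypothetical.


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-SSE run."""
    best_labels, best_sse = None, float("inf")
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional to D(x)^2.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points))
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center.
            labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            # Update step: each center moves to the mean of its members.
            new_centers = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centers.append(centers[c])  # an empty cluster keeps its old center
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def pair_jaccard(labels_a, labels_b):
    """Jaccard coefficient over point pairs: co-clustered in both / co-clustered in either."""
    n11 = n10 = n01 = 0
    for i in range(len(labels_a)):
        for j in range(i + 1, len(labels_a)):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            n11 += same_a and same_b
            n10 += same_a and not same_b
            n01 += same_b and not same_a
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


def stability_scores(points, k, n_pairs=20, frac=0.8, seed=0):
    """Similarity of k-means partitions of random subsample pairs, measured on their overlap."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        s1, s2 = rng.sample(range(n), m), rng.sample(range(n), m)
        lab1 = dict(zip(s1, kmeans([points[i] for i in s1], k, rng)))
        lab2 = dict(zip(s2, kmeans([points[i] for i in s2], k, rng)))
        common = sorted(set(s1) & set(s2))
        scores.append(pair_jaccard([lab1[i] for i in common], [lab2[i] for i in common]))
    return scores


def estimate_k(points, k_values, **kwargs):
    """Pick the k (k >= 2) whose similarity scores have the highest mean."""
    means = {k: statistics.mean(stability_scores(points, k, **kwargs)) for k in k_values}
    return max(means, key=means.get), means


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D Gaussian blobs.
    gen = random.Random(1)
    centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5)) for cx, cy in centres for _ in range(30)]
    best_k, mean_scores = estimate_k(data, [2, 3, 4, 5])
    print(best_k, {k: round(s, 3) for k, s in mean_scores.items()})
```

On well-separated blobs like these, the scores for k = 3 should sit at or near 1.0, while k = 2 and k ≥ 4 force merges or splits that tend to vary between subsamples. k = 1 is left out because it trivially scores 1.0 under any pair-counting index.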
Similar resources
Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses
OBJECTIVE Clustering algorithms may be applied to the analysis of DNA microarray data to identify novel subgroups that may lead to new taxonomies of diseases defined at bio-molecular level. A major problem related to the identification of biologically meaningful clusters is the assessment of their reliability, since clustering algorithms may find clusters even if no structure is present. METH...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting valuable high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Extracellular exosomes and preeclampsia: a microarray-based study and functional enrichment analysis
Background: Preeclampsia (PE) is a heterogeneous pregnancy disease whose exact pathophysiology is unknown. Recently, exosomes have been indicated as a causative factor in the pathogenesis of PE. The aim of the study was to investigate microarray library data to extract the differentially expressed genes (DEGs) in PE and to perform a functional enrichment analysis to predict the rol...
Graph-based consensus clustering for class discovery from gene expression data
MOTIVATION Consensus clustering, also known as cluster ensemble, is one of the important techniques for microarray data analysis, and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches have the ability to integrate multiple partitions from different cluster solutions to improve the robustness, stabili...